arXiv:1507.06920v1 [q-bio.PE] 24 Jul 2015


The long-tail distribution function of mutations in bacteria 

Augusto Gonzalez 

Instituto de Cibernetica, Matematica y Fisica, La Habana, Cuba 

Levy flights in the space of mutations model the time evolution of bacterial DNA. The model
parameters are adjusted to fit observations from the Long Time Evolution Experiment
with E. coli.

PACS numbers: 61.80.Hg, 87.53.-j, 87.23.Kg 


The Long Time Evolution Experiment. I recall
the extremely interesting experiment with E. coli, conducted
by Prof. R. Lenski and his group [1, 2], which has been
running for more than 27 years. Among the reported
results, I use the following [3]:

1. In a culture of bacteria, after 20,000 generations,
around $3 \times 10^8$ single point mutations in the DNA are
registered. These are local modifications of the DNA
chain. Note that the number of bacteria undergoing
continuous evolution is around $5 \times 10^6$.

2. They also measure the frequency of mutations involving
rearrangements in segments of the DNA, in particular
mutations in which the repair mechanisms are
damaged and the mutation rate increases 100 times. This
mutator phenotype becomes dominant in two of twelve
cultures (probability 1/6) after 2500-3000 generations,
in a third culture (cumulative probability 1/4) after 8,500
generations, and in a fourth culture (cumulative probability
1/3) after 15,000 generations.

The purpose of my paper is to present a model for 
mutations in bacteria and to adjust the model parameters 
in order to qualitatively fit these data. 

The accumulative character of mutations. 

In my model, the time evolution of cells defines trajectories,
as schematically represented in Fig. 1, where two
of these trajectories are drawn in red.



FIG. 1. Schematic representation of the evolution of bacteria
in the Long Time Evolution Experiment. Every day, the
cells experience a clonal expansion in which the initial number
$N_{cell} \approx 5 \times 10^6$ is raised 100 times. However, only $N_{cell}$
bacteria pass to the next day. The evolution trajectories of
two cells are marked by red dashed lines.


The idea of trajectories in the evolution of cells
means that there are Markov chains [4] of mutations,
where the change in the DNA of a cell at step $i+1$,
$X_{i+1}$, comes from the change in the previous step plus an
additional modification:

$$X_{i+1} = X_i + \delta. \qquad (1)$$

Horizontal DNA transfer is not considered. 

Measuring changes in the DNA. A single strand of
E. coli DNA contains around $4.6 \times 10^6$ bases of a four-letter
alphabet: A, G, C, and T [5]. In order to measure
changes in the DNA, one may use a variable similar to
that of Ref. [6].

First, define an auxiliary variable at site $\alpha$ in the
molecule: $u_\alpha(G) = 3/8$, $u_\alpha(A) = 1/8$, $u_\alpha(T) = -1/8$,
and $u_\alpha(C) = -3/8$. Then, define a walk along the DNA:

$$y(\beta) = \sum_{\alpha=1}^{\beta} u_\alpha. \qquad (2)$$
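
As an illustration of the walk in Eq. (2), the following minimal Python sketch accumulates the base weights along a sequence. The function name `dna_walk` and the four-base toy sequence are hypothetical and serve only as an example.

```python
from itertools import accumulate

# Base weights u_alpha as defined in the text.
U = {"G": 3 / 8, "A": 1 / 8, "T": -1 / 8, "C": -3 / 8}


def dna_walk(sequence):
    """Return [y(1), ..., y(len(sequence))]: the running sum of base weights."""
    return list(accumulate(U[base] for base in sequence))


# Hypothetical toy sequence, for illustration only.
profile = dna_walk("GATC")
print(profile)  # [0.375, 0.5, 0.375, 0.0]
```

Because the four weights sum to zero, a sequence containing each base equally often returns to y = 0 at its end, as the toy example shows.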

As a function of $\beta$, the variable $y$ draws a profile of
the DNA molecule, and modifications can be measured
as $X(\beta) = y(\beta) - y_0(\beta)$, where $y$ corresponds to the
mutated DNA and $y_0$ to the initial configuration. Of
course, there are so many $X(\beta)$, about five million, that they
are not of practical use. The strategy could be to use
variables measuring global changes or distances to the
original function:

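$$X = \sum_{\alpha=1}^{L} \left( u_\alpha - u_\alpha^{(0)} \right), \qquad (3)$$

$$X^{(1)} = \sum_{\alpha=1}^{L} \alpha \left( u_\alpha - u_\alpha^{(0)} \right), \qquad (4)$$

$X^{(2)}$ (the second moment), etc. Here $u_\alpha^{(0)}$ refers to the initial
configuration and $L$ is the length of the
molecule. The Shannon informational entropy [7] could
also be of use.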
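
The global variables can be illustrated with a short Python sketch. It assumes that X is the sum over sites of the difference in u between the mutated and the initial sequence, and that the first moment weights each difference by its site index. It handles base substitutions only (sequences of equal length). The function name and the toy sequences are hypothetical.

```python
# Base weights u_alpha as defined in the text.
U = {"G": 3 / 8, "A": 1 / 8, "T": -1 / 8, "C": -3 / 8}


def global_changes(mutated, original):
    """Return (X, X1): total change and its first moment over site index."""
    if len(mutated) != len(original):
        raise ValueError("only substitutions are handled: lengths must match")
    diffs = [U[m] - U[o] for m, o in zip(mutated, original)]
    x = sum(diffs)
    # Sites are numbered from 1, as in the sums of the text.
    x1 = sum(alpha * d for alpha, d in enumerate(diffs, start=1))
    return x, x1


# Hypothetical example: a single T -> C substitution at site 3.
print(global_changes("GACC", "GATC"))  # (-0.25, -0.75)
```

With this reading, X coincides with the end point of the profile difference, X(L) = y(L) - y_0(L).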

In what follows, I shall assume that mutations are well 
characterized by a few global variables. 

Levy model of mutations. The $\delta$ term in Eq. (1)
represents mutations at step $i+1$. It may come from
partially repaired damage in the DNA that is fixed after
replication, or from a pure error in the replication
process. It should be stressed that both the repair mechanisms
and the replication process guarantee very high
fidelities. The error introduced by the latter, for example,
is around one mistaken base per $10^9$ bases in the
human DNA strand [8].

FIG. 2. Schematic representation of a single cell mutation trajectory.
The starting point is $X = 0$. In the mutation space,
I distinguish regions in which the DNA repair mechanism
is active or damaged.

Let me stress once again that $\delta$ is not the damage
caused by endogenous or external factors, but the resulting
modification after the action of the repair mechanisms.
It is known, for example, that ionizing radiation
may cause double strand breaks in the DNA [9]. These
damages are very difficult to repair [8]. The repair mechanism
itself may introduce large changes in the resulting
DNA composition after a double strand break event.

My proposal for $\delta$ is the following: $\delta = \delta_B + \delta_{LJ}$. The
$\delta_B$ component corresponds to a Brownian motion with
maximal amplitude $D_B$. Notice that $D_B = 1$ would mean
roughly a change of base in each replication step because
$u_\alpha(G) - u_\alpha(C) = 3/4$. This Brownian motion introduces
local modifications in the DNA. After $N_{step}$ replication
steps, the characteristic dispersion of a trajectory due to
this Brownian motion (something like the radius of the
colored region near the origin in Fig. 2) is $D_B \sqrt{N_{step}}$ [10].

The large-jump component of $\delta$, $\delta_{LJ}$, on the other
hand, is modeled with the help of rare events with total
probability $p \ll 1$ and a probability density proportional
to $1/\delta_{LJ}^2$, where the amplitude ranges from $D_B$ to
infinity (in practice, I will introduce a cutoff, $D_{max}$). The
combination of the Brownian motion and the large-amplitude
jumps leads to Levy flights [10] in the mutation
space, schematically represented in Fig. 2.
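
A minimal Python sketch of one such trajectory is given here, under several assumptions that the text does not fix: the Brownian part is drawn uniformly in [-D_B, D_B], the large jumps have a random sign, and the jump amplitude follows a density proportional to 1/x^2 between D_B and a cutoff, sampled by inverting its cumulative distribution. The function names and the cutoff value used in the example are hypothetical.

```python
import random


def sample_delta(rng, d_b, p, d_max):
    """Draw one mutation step: Brownian part plus an occasional large jump."""
    # Brownian part: uniform in [-d_b, d_b] (assumed reading of "maximal amplitude").
    delta = rng.uniform(-d_b, d_b)
    # Large jump: occurs with total probability p.
    if rng.random() < p:
        # Inverse-CDF sampling of a density proportional to 1/x**2 on [d_b, d_max].
        u = rng.random()
        inverse_amplitude = 1.0 / d_b - u * (1.0 / d_b - 1.0 / d_max)
        jump = 1.0 / inverse_amplitude
        # Random sign (assumption: jumps are symmetric).
        delta += jump if rng.random() < 0.5 else -jump
    return delta


def simulate_trajectory(rng, n_step, d_b, p, d_max):
    """Accumulate n_step mutation steps starting from X = 0."""
    x = 0.0
    for _ in range(n_step):
        x += sample_delta(rng, d_b, p, d_max)
    return x


# Example call; d_max = 1e6 is a hypothetical cutoff, not a value from the text.
end_point = simulate_trajectory(random.Random(1), 3000, 0.19, 1.3e-5, 1e6)
```

With a uniform Brownian step the dispersion after N steps is D_B sqrt(N/3) rather than D_B sqrt(N); steps of fixed size D_B and random sign would reproduce the latter exactly.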

Let me note that the distribution function associated
with Levy flights is a fat-tailed, or long-tailed, one. This fact could
be related to the long-range correlations observed in the
walks along the DNA [6].

The long-tail distribution function of mutations.
Four parameters enter my oversimplified Levy
model of mutations: $N_{cell}$, $N_{step}$, $D_B$ and $p$. As mentioned
above, $N_{cell} = 4.6 \times 10^6$. On the other hand, $N_{step}$
is the number of replication steps along a trajectory.

$D_B$ is the amplitude of the Brownian motion. It is
determined from the observed number of single point
mutations (SPMs) after 20,000 generations. The number
of SPMs per bacterium is $3 \times 10^8 / (4.6 \times 10^6) \approx 65$. The
characteristic dispersion of the trajectory, for its part, is
the Brownian radius, $\sqrt{N_{step}}\, D_B \approx 140\, D_B$. In order
to estimate the equivalent number of SPMs, I divide the
latter by the mean deviation involved in an SPM, that is,
5/12. Notice that $u(G) - u(A) = 1/4$, $u(G) - u(T) = 1/2$,
etc. Thus, $65 = 140\, D_B / (5/12)$, and $D_B \approx 0.19$.
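
The arithmetic behind this estimate can be checked with a few lines of Python. The sketch assumes that the mean deviation of 5/12 is the average of |u(a) - u(b)| over the six unordered pairs of distinct bases, each weighted equally; the variable names are illustrative.

```python
from itertools import combinations
from math import sqrt

# Base weights u as defined in the text.
U = {"G": 3 / 8, "A": 1 / 8, "T": -1 / 8, "C": -3 / 8}

# Mean |u(a) - u(b)| over the six unordered pairs of distinct bases
# (assumed equal weighting of all substitutions).
pairs = list(combinations(U, 2))
mean_dev = sum(abs(U[a] - U[b]) for a, b in pairs) / len(pairs)

spm_per_cell = 3e8 / 4.6e6      # about 65 single point mutations per bacterium
radius_over_db = sqrt(20_000)   # about 141; the text rounds this to 140

# Solve spm_per_cell = radius_over_db * D_B / mean_dev for D_B.
d_b = spm_per_cell * mean_dev / radius_over_db
print(mean_dev, spm_per_cell, d_b)  # mean_dev equals 5/12; d_b is about 0.19
```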

Finally, the parameter $p$ is fixed to $1.3 \times 10^{-5}$. Below,
I shall come back to the way of determining it.

In the simulations, all of the $N_{cell}$ trajectories start at
$X = 0$. In any replication step, mutations are given by
Eq. (1), where $\delta$ contains both the Brownian and the
large-amplitude components.

The probability distribution function for mutations in
a cell, $P(X)$, is the probability that a cell arrives at
the end point with an amplitude $X$. For convenience,
I compute not $P(X)$ but the cumulative probability
distribution, $P(|X| > Z)$, which is shown in Fig. 3 for
$N_{step} = 3000$.



FIG. 3. The average cumulative probability of mutations,
$P(|X| > Z)$, for a single bacterium after 3000 generations.
Points come from the numerical simulations, whereas the red
solid line is a $1/Z$ fit to the tail. The Brownian radius,
$D_B \sqrt{N_{step}}$, is marked by a dashed line.

The Brownian radius, $\sqrt{N_{step}}\, D_B \approx 10.4$, concentrating
most of the points, is apparent in the figure. In addition,
the tail can be fitted to a $1/Z$ dependence. The
coefficient is roughly $N_{step} D_B\, p$.
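
One way to see where this coefficient may come from: if the jump density is normalized as p D_B / x^2 for x > D_B, a single step exceeds Z with probability p D_B / Z, and over N_step steps of rare jumps the probability that at least one jump exceeds Z is roughly N_step D_B p / Z. The Python sketch below evaluates this estimate and the Brownian radius for the quoted parameters, and includes a helper for the empirical cumulative probability of a list of simulated end points. The function names are hypothetical.

```python
from math import sqrt


def brownian_radius(n_step, d_b):
    """Characteristic dispersion of the Brownian part after n_step steps."""
    return sqrt(n_step) * d_b


def tail_probability(z, n_step, d_b, p):
    """Single-large-jump estimate of P(|X| > z), valid far beyond the Brownian radius."""
    return n_step * d_b * p / z


def empirical_ccdf(end_points, z):
    """Fraction of trajectories whose end point satisfies |X| > z."""
    return sum(1 for x in end_points if abs(x) > z) / len(end_points)


print(brownian_radius(3000, 0.19))                  # about 10.4
print(tail_probability(100.0, 3000, 0.19, 1.3e-5))  # about 7.4e-5
```

The estimate is meant only for Z well above the Brownian radius and below any cutoff on the jump amplitude.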

The data on the mutator phenotype are used to fix
the slope in the tail. I assume that the repair
mechanisms are related to a coding region in the DNA
of length $l$. The mechanisms are damaged when this
region suffers modifications greater than a given $X_u$. The
cumulative probability can be estimated as $N_{cell} P(|X| > X_u)$.
Using the functional dependence in the tail, I get:

$$\mathrm{Cum.\ Prob.} \approx N_{cell}\, \frac{N_{step} D_B\, p}{X_u}\, \frac{l}{L} = a\, N_{cell} N_{step}. \qquad (5)$$

So far, I do not have precise values for $l$ and $X_u$. Reasonable
numbers are $l/L \sim 10^{-2}$, $X_u/L \sim 10^{-3}$. From
the observed probabilities, I get $a \approx 5.4 \times 10^{-12}$, as shown
in Fig. 4, from which it follows that $p = 1.3 \times 10^{-5}$.
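
The step from a to p can be checked numerically. The sketch below assumes that Eq. (5) amounts to a = D_B p (l/L) / X_u, takes L as the genome length of 4.6 x 10^6 bases, and uses the order-of-magnitude ratios quoted above as if they were exact. Variable names are illustrative.

```python
genome_length = 4.6e6   # L, number of bases in the E. coli genome (from the text)
l_over_L = 1e-2         # relative length of the repair-related region
xu_over_L = 1e-3        # relative threshold X_u / L
d_b = 0.19              # Brownian amplitude D_B
a = 5.4e-12             # slope obtained from the observed probabilities

# Threshold in absolute units, then invert a = D_B * p * (l/L) / X_u for p.
x_u = xu_over_L * genome_length
p = a * x_u / (d_b * l_over_L)
print(p)  # about 1.3e-5
```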

The asymptotic formula for events in the tail of the
distribution, Eq. (5), is valid no matter how precisely
$l$ and $X_u$ are known.



FIG. 4. Cumulative probability of the mutator phenotype
in the Long Time Evolution Experiment. The line is a fit
according to Eq. (5).
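
A fit of this kind can be sketched in Python as a least-squares line through the origin, applied to the three observations listed in the first section. Two choices here are assumptions: the interval 2500-3000 generations is represented by its midpoint, 2750, and N_cell is taken as 4.6 x 10^6. Whether the fit in the figure was obtained in exactly this way is not stated in the text.

```python
n_cell = 4.6e6

# (generations, cumulative probability of the mutator phenotype)
data = [(2750, 1 / 6), (8500, 1 / 4), (15000, 1 / 3)]

# Least-squares slope of a line through the origin: sum(x*y) / sum(x*x).
slope = sum(n * c for n, c in data) / sum(n * n for n, _ in data)

# The slope equals a * N_cell, so divide by N_cell to obtain a.
a = slope / n_cell
print(a)  # about 5.4e-12
```

Under these assumptions the slope comes out close to the value of a quoted in the text.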

Mutations and natural selection. Let me stress
that in Fig. 4 probabilities are measured in a set of
12 cultures. Thus, one expects errors of the order of
$1/\sqrt{12} \approx 0.3$. In addition, Lenski and his group report
not the occurrence of the mutation, but the moment at
which the phenotype becomes dominant in a population.
In this process, natural selection plays a major role.

In both the DNA-repaired and DNA-unrepaired regions
of the mutation space, there exist points with an
evolutionary advantage. These points act as attractors in the
mutation space.

Natural selection may be included in my model by
introducing a relative fitness parameter, $w$. The values $w_r = 1$ and
$w_u$ apply to regions of radius three around the centers of
the DNA-repaired and DNA-unrepaired areas. Outside
these regions, $w_0 = 0.7$. I introduce a clonal expansion
phase in which the number of cells increases 100 times,
as in the Lenski experiment, but only $N_{cell}$ bacteria pass
to the next step. The bacteria are selected according to
the conditional probability $w/(w_0 + w_r + w_u)$. Results
are to be published elsewhere.
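
A selection step of this kind might be sketched in Python as follows. Everything beyond the values w_r = 1, w_0 = 0.7, the radius of three, and the 100-fold expansion is an assumption made for illustration: the mutation space is taken as one-dimensional, the region centers and the value of w_u are hypothetical, clones are exact copies of their parent, and cells are drawn with replacement with weights proportional to w.

```python
import random

W_R = 1.0      # relative fitness inside the DNA-repaired region (from the text)
W_0 = 0.7      # relative fitness outside both regions (from the text)
RADIUS = 3.0   # radius of the favored regions (from the text)


def fitness(x, w_u, center_r=0.0, center_u=50.0):
    """Relative fitness at position x; the two centers are hypothetical."""
    if abs(x - center_r) <= RADIUS:
        return W_R
    if abs(x - center_u) <= RADIUS:
        return w_u
    return W_0


def select_next_day(rng, cells, w_u, expansion=100):
    """Expand each cell into identical clones, then keep len(cells) of them."""
    pool = [x for x in cells for _ in range(expansion)]
    weights = [fitness(x, w_u) for x in pool]
    # Draw with replacement, with probability proportional to relative fitness.
    return rng.choices(pool, weights=weights, k=len(cells))


# Example with a hypothetical w_u = 1.2 and three cells.
survivors = select_next_day(random.Random(0), [0.0, 10.0, 50.0], w_u=1.2)
```

In a full simulation the mutation step of Eq. (1) would be applied during the expansion phase, which this sketch omits.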

Levy model of cancer. With appropriate parameters,
my Levy model can also be applied to mutations in
stem cells and, in particular, to the analysis of lifetime
cancer risk in different tissues with the help of a formula
like Eq. (5). Results are to be published elsewhere.

I would like to stress only the intriguing fact that in
cases such as ovarian germinal cell cancer, where physical
barriers act as protection and the action of the immune
system is partially depressed, the slope $a$ takes values
similar to the number obtained for bacteria.